Tootfinder

Opt-in global Mastodon full text search. Join the index!

@codinghorror@infosec.exchange
2024-03-08 00:13:23

To me, the most direct genetic line between Blur and Gorillaz is this song: music.youtube.com/watch?v=1YXc

@karlauerbach@sfba.social
2024-03-19 04:21:09

Today in the arguments in Murthy v. Missouri, non-justice Thomas asked what the dividing line is between unlawful gov't coercion of social media and lawful acts of persuasion.
Well, the answer was there to be given, but apparently no one gave it: the answer is to pull up the standard that SCOTUS used in McDonnell v. United States, where they said that a bribe, to be a bribe, had to be very directly tied to a quid pro quo.
Well, that same standard could be used with regard to gov…

@arXiv_eessSP_bot@mastoxiv.page
2024-04-01 07:25:56

First path component power based NLOS mitigation in UWB positioning system
Marcin Kolakowski, Jozef Modelski
arxiv.org/abs/2403.19706

@ukraine_live_tagesschau@mastodon.social
2024-02-16 03:10:48

Selenskyj wants to end the grain dispute with Poland
Given his country's increasingly difficult situation because of the Russian war of aggression, Ukrainian President Wolodymyr Selenskyj is pressing for an immediate settlement of the grain dispute with Poland. He has instructed his government to negotiate this as quickly as possible with Polish Prime Minister Donald Tusk, Selenskyj said in a video message. The two neighbor…

@arXiv_csCL_bot@mastoxiv.page
2024-05-01 06:49:10

Iterative Reasoning Preference Optimization
Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston
arxiv.org/abs/2404.19733 arxiv.org/pdf/2404.19733
arXiv:2404.19733v1 Announce Type: new
Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. We train using a modified DPO loss (Rafailov et al., 2023) with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy for Llama-2-70B-Chat from 55.6% to 81.6% on GSM8K (and 88.7% with majority voting out of 32 samples), from 12.5% to 20.8% on MATH, and from 77.8% to 86.7% on ARC-Challenge, which outperforms other Llama-2-based models not relying on additionally sourced datasets.
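The abstract describes training with a standard DPO preference loss plus an extra negative log-likelihood term on the winning chain-of-thought. Below is a minimal Python/PyTorch sketch of what such a combined objective could look like; the function name, the length normalisation of the NLL term, and the default values of beta and alpha are assumptions for illustration, not details taken from the paper.

import torch
import torch.nn.functional as F

def dpo_plus_nll_loss(
    policy_chosen_logps: torch.Tensor,    # sum of log-probs of winning responses under the policy
    policy_rejected_logps: torch.Tensor,  # sum of log-probs of losing responses under the policy
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    chosen_lengths: torch.Tensor,         # token counts of winning responses (for length-normalised NLL)
    beta: float = 0.1,                    # DPO temperature (assumed value)
    alpha: float = 1.0,                   # weight on the NLL term (assumed value)
) -> torch.Tensor:
    # Standard DPO term (Rafailov et al., 2023): negative log-sigmoid of the scaled
    # difference between policy/reference log-ratios for winning vs. losing answers.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    dpo_term = -F.logsigmoid(beta * (chosen_ratio - rejected_ratio))

    # Additional negative log-likelihood term on the winning CoT + answer,
    # length-normalised in this sketch.
    nll_term = -policy_chosen_logps / chosen_lengths.clamp(min=1)

    return (dpo_term + alpha * nll_term).mean()

Per the abstract, the added NLL term is what the authors find crucial for the iterative scheme to keep improving reasoning accuracy across iterations.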

@arXiv_csIR_bot@mastoxiv.page
2024-02-28 08:30:44

This arxiv.org/abs/2309.08541 has been replaced.
initial toot: mastoxiv.page/@arXiv_csIR_…

@arXiv_csCL_bot@mastoxiv.page
2024-03-29 08:32:00

This arxiv.org/abs/2403.17752 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCL_…

@arXiv_mathLO_bot@mastoxiv.page
2024-03-18 08:38:51

This arxiv.org/abs/2401.01979 has been replaced.
initial toot: mastoxiv.page/@arXiv_mat…

@arXiv_eessSP_bot@mastoxiv.page
2024-03-29 07:22:16

Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning
Dieter Coppens, Ben Van Herbruggen, Adnan Shahid, Eli De Poorter
arxiv.org/abs/2403.19262
